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1 Introduction 

Power grids have long been a source of interesting optimization problems. Perhaps best known 
among the optimization community are the unit commitment problems and related generator dis- 
patching tasks, see [16]. However, recent blackout events have renewed interest on problems related 
to grid vulnerabilities. 

A difficult problem that has been widely studied, the N — K problem, concerns the detection of 
small cardinality sets of lines or buses whose simultaneous outage could develop into a significant 
failure event; see [6], [23] and references therein. This is a hard combinatorial problem which, unlike 
the typical formulations for the unit commitment problem, includes a detailed model of flows in 
the grid. A different set of algorithmic questions concern how to react to protect a grid when a 
significant event has taken place. This is the outlook that we take in this paper. 

In this context, the central modeling ingredient is that power grids display cascading behavior. 
A cascade is the process by which components of the grid (especially, power lines) sequentially 
become inoperative. In a catastrophic cascade this process accelerates (it 'snowballs') until the 
grid collapses. A control action must take this multi-step behavior into account because a myopic 
action taken at the start of the process so as to immediately arrest the cascade may prove far from 
optimal. 

The computation of an optimal control can be formulated as a multi-stage mixed-integer pro- 
gramming problem; ideally this should be a stochastic or robust formulation. Furthermore the 
formulation will need to include an explicit model of the power flows. When dealing with large- 
scale (or even medium-scale) grids it is likely that such a formulation will prove extremely in- 
tractable. In addition, it is likely that the prescribed control will call for counter-intuitive and 
possibly impractical actions. See Section 4 and Appendix A 

In this paper, building on prior models for cascades, wc consider an affine, adaptive, distributive 
control algorithm that is computed at the start of the cascade and deployed during the cascade. 
The control sheds demand as a function of observations of the state of the grid, with the objective 
of terminating the cascade with a minimum amount of demand lost. The optimization problem 
handled at the start of the cascade computes the coefficients in the affine control (one set of 
coefficients per demand bus). The discussion of this approach starts in Section 4.1; Section 5 
describes an initial set of experiments with a simple form of affine control. 

Most of our algorithms are ffist-order methods that compute local optima (see Section 6.1); 
however in a special case which is nevertheless of interest we obtain an exact algorithm that runs in 
polynomial-time (Section 6.2). Algorithms that account for stochastics are discussed in Section 8. 
We present numerical experiments with parallel implementations of our algorithms, using as data 
a snapshot of the U.S. Eastern Interconnect, with approximately 15000 buses and 23000 lines. 

2 Notation 

In the linearized approximation to the power flow problem, we are given a directed graph G with 
n buses and m lines (denoted, respectively, "nodes" and "arcs" in traditional graph theoretic 

language). In addition 
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• A line j is oriented from its tail, t{j) to its head, h{j). The orientation of any line is arbitrary 
and is simply used for notational convenience. The set of lines is denoted by A- 

• For each line j we are given two positive quantities: its flow limit uj and its reactance Xj. 

• We are given a supply-demand vector /3 G with the following interpretation. For a bus i, 
if /3i > then z is a generator (a source bus) while if /3j < then i is a load (a demand bus) 
and in that case — /3j is the demand at i. The condition '^iPi = is assumed to hold. For 
a generator bus i, we indicate by the constant Sj the maximum supply of i. We denote by Q 
denote the set of generators and by V the set of demand buses. 

The linearized power flow problem specifies a variable fij associated with each line j and a variable 
0i associated with each bus i. Denoting, for each bus i, the set of lines oriented out of (into) i by 
(5+(z) (resp., 5~{i)), the power flow problem consists in finding a solution to the following system 
of equations: 

E Z^ - E = ^busi, (1) 

ie<5+(i) (i)G<5-» 

^t{j) - (t>h{j) - Xjfj = V line j. (2) 

These equations can simply be abbreviated as 

Nf = /3, N'^4>-Xf = 0, 

where N denotes the bus-arc incidence matrix of G, and X = diagjxjj}. 

Remark 2.1 It can easily he shown that system (l)-(2) is feasible if and only if^i^xPi ~ ^ /'^'^ 
each component ("island") K of G, and in that case the solution is unique in the f variables. 

We stress that the orientation of the lines is arbitrary, consequently for a line j we might have 
fj < 0, indicating that power flows in the reverse direction. As a final point we note that the fiow 
limits Uij do not appear in (l)-(2); consequently, it is possible for the (unique) solution / to exceed 
the flow limits, i.e. it is possible that \fj\ > uj for certain lines j. 

3 Cascades 

In this section we introduce our model of cascading failures of power grids, which draws from the 
models in [9], [10], [11]. A cascade starts with an initial event, for example the removal of a number 
of lines, that alter and compromise the power grid. Informally, this is followed by a sequence of 
additional line outage events which are interspersed with simple control mechanisms as the grid 
adjusts to decreased resources, for example decreased generator capacity. 
As a starting point, we have the following template: 



Template 3.1 GENERIC CASGADE TEMPLATE 

Input: a power grid with graph G (post-initiating event). Set G^ = G. 
For r = 1,2,... do 

(comment: round r of the cascade) 

1. Set = vector of power flows in G^'. 

2. Set C = set of lines of C that become outaged in round r. 

3. Set = C - C. Adjust demands and supplies in C. 



In this template, graph G represents the grid we have after the initial event that causes the cascade. 
To make the template complete, we must specify a mechanism for determining the set C in Step 
2. and how to adiust demands and suDDlies in Steo 3. We tackle the second issue first. 



The supply/demand adjustment in Step 3 handles cases where the removal of from 
creates new components in C^'^^ . Any imbalance between supply and demand in a component 
of C"*"^ must be corrected: if, in a component of C"*"^, total supply exceeds total demand, then 
supply must be reduced, and viceversa. Such actions constitute a form of control. We will discuss 
this item in more detail in Section 4.1. 

Now we turn to step Step 2. We will say that line j is overloaded \x\ round r if |/J| > Uj. Practical 
experience shows that overloaded lines are likely to be come outaged. Two ways to implement this 
observation are: 

(F.l) Simple deterministic rule: j € C if \f^\/uj > 1 (alternatively, if |/J|/uj > 1). 

(F.2) Stochastic rule: for each line j there is a function 7j such that if \fj\/uj > 1 then j G 
with probability 'JjUfjl/uj). In [10], the function -jj takes a fixed value p. 

The two alternate forms of (F.l) are not strictly equivalent but we assume that from a practical 
perspective they are essentially identical. In any case, the use of (F.l) may not be desirable 
because it represents a strict numerical criterion that may be difficult to implement using standard 
numerical algorithms - a possibility is to use for Uj a slightly larger value than the true flow limit. 
The constant version of rule (F.2) can be criticized in that we would expect that higher overloads 
cause outages with higher probabilities; that is to say, we would expect that 7j(i) 1 as t ^ +oo. 
The problem of choosing, and calibrating such functions 7j is significant. 

Both rules can additionally be criticized on the grounds that they are memory-free; from a 
realistic perspective a line that was highly overloaded in round r — 1 should be more likely to be 
outaged in round r than one that was not. To address this issue, we assume that for any line j we 
are given a parameter < a j < 1 and define quantities /J by 

/; = aj\f^\ + {l-a,)f^-\ (3) 

with fj set to the absolute value of the flow on (j) prior to the incident that initiates the cascade. 

The /J quantities are then used instead of to obtain memory-dependent versions of rules (F.l) 
and (F.2). A variation of (3) is 

/J = «,.|/;| + (l-a,)|/;-^|. (4) 

The choice of the parameters Xj depends on the time scale of the cascade, but for robustness 
purposes the Xj should be treated as noisy. 

The memory-dependent versions of rules (F.l) and (F.2) can still give rise to non-smooth be- 
havior and ill-conditioning: for example, solving the power flow equations with different solvers can 
give rise to different cascades. In order to lessen this difficulty, we introduce an additional detail 
in choosing if a line becomes outaged. 



Rule 3.2 STOCHASTIC LINE OUTAGE 

Parameters: < < 1 for each round r. 
Notation: refer to Template 3.1 and equation (3). 
Application: For a line j in C: 

if Uj < f;, then j e 0^ (5) 
if (1 - er)uj < fj < Uj, 

then j G with probability ^, (6) 

if f;<il-er)uj, then j ^ 0\ (7) 



The random choice in (6) is an indirect way to incorporate some of the (poorly defined) "noise" 
mentioned above; additionally, from a mathematical perspective, it serves to smooth the cascade 
process. Typically we would have €i < 62 < • • ., indicating increasing uncertainty as the cascade 
progresses. If e,- = for all r we obtain the pure deterministic rule. 

Rule 3.2, and extensions, will be used later in our numerical experiments. 

4 Control Algorithms 

We consider control algorithms designed to stop the cascade after a fixed number of rounds with 
a maximum amount of total demand feasibly satisfied. In developing such algorithms we assume 
that the cascade is initially slow-paced so that significant computation is possible at time zero 
(immediately after the initiating event). While this may not be true for all cascades, it was true in 
the case of the 2003 cascade in the Northeast U.S. and Canada [25] with (arguably) on the order 
of one hour elapsing between subsequent outages at the start. Thus, wc assume that the algorithm 
is computed at time zero, with significant information available as to the state of the grid; the 
algorithm will be applied as the cascade progresses, with no further computation. We assume, as 
a control requirement, that there is a final round R in the cascade at the end of which no lines can 
be overloaded. 

The computation of an optimal schedule for demand shedding can be stated as mixed-integer 
optimization problem; see Appendix A. However, it is not clear that such an approach is either 
computationally feasible or even desirable (see the discussion in the Appendix). In this paper we 
will take a different outlook. 

4.1 Adaptive control 

We focus on robust control algorithms that take as input limited, real-time observations on the 
state of the grid and which prescribe simple control actions such as distributed load shedding (loss 
of demand). Our generic cascade template (3.1) is modified as follows: 



Template 4.1 CASCADE CONTROL 
Input: a power grid with graph G. Set = G. 
Step 0. Compute control algorithm. 

For r = l,2,...,i?- 1, do 

(comment: controlled round r of the cascade) 

1. Set = vector of power flows in . 

2. Observe state of grid (from state estimation). 

3. Apply control. 

4. Set = vector of resulting power flows in C. 

5. Set O""' = set of lines of G"" that become outaged in round r. 

6. Set = G"^ — O^' . Adjust loads and generation in C. 

Termination (round R). If any island of G^ has line overloads, proportionally shed 
demand in that island until all line overloads are eliminated.'* 

"The criterion of "stability" inherent in the termination step may obviously be incomplete when using 
a more complete model of power flows than the linearized model. 



In this template, steps 0, 2 and 3 are the only ones requiring controller actions. The remaining 
steps are due to the physics of the grid or underlying low-level automatic control steps. As discussed 
above, in this paper we assume that the cascade allows enough time for simificant computation to 



take place in step 0. One could conceive of a variant of the template where step is carried out in 
advance of an initiating event, thus obtaining a more general form of control. However, proceeding 
as in the template resolves an exponential number of potential outcomes, likely obtaining a simpler 
and faster step and a more effective control algorithm. 

Having applied the control in a given round, a given (pre-existing) component of C is likely 
to experience a supply/demand imbalance. This condition must be removed, which we assume can 
be undertaken through a low-level control mechanism. This is not a trivial assumption and if the 
imbalance is large the rebalancing may be deemed impossible with existing technology, resulting 
in the loss of all demand in that component. It is straightforward to implement such a "maxi- 
mum imbalance" fcatiirc in the above template; however for the sake of simplicity wc will assume 
that we can always rebalance supply and demand by proportionally decreasing the output of each 
generator in a given component (again, other rebalancing mechanisms are possible, giving rise to 
alternative versions of the template). Having effected the rebalancing, a new set of power flows 
will be instantiated: this is vector g'' in Step 4. Steps 5 and 6 now follow as in the generic cascade 
template. In Step 6, some of the (new) components of G^~^^ may have an excess of supply over 
demand or the other way around, and again we make the assumption that this excess is removed 
through a proportional scaling mechanism. 

To make the above template complete, we need to describe the type of control we have in mind, 
including what type of data observations it requires and what kind of control actions it specifics. 
In terms of the last item, many possibilities exist (including modifying the structure of the grid by 
e.g. shutting down lines, in which case the terminology in Steps 4-6 is not strictly correct) but in 
this paper we will focus on one type of action which is feasible in practice: "load shedding" or the 
controlled loss of demand. 

In the linearized (DC) power flow models we have variables of just two types: power flows and 
phase angles. In this paper we concentrate on control algorithms that observe real-time quantities 
related to flows. Two such quantities are: 

(a) The maximum overload: m8iXj^G^{\fj\/uj}. 

(b) The maximum relative flow variability: maXj^Qr-i{\fj — /J~^|/|/J~^|}, where we assume the 
maximum is taken over the lines with /j^ 7^ 0. 

From a practical standpoint, a relevant issue is how accurately (a) or (b) can be observed in real 
time, and whether such measurements can be disseminated to all buses of the grid. We assume that 
in early rounds of a cascade this is not a fundamentally difficult technological problem. Nevertheless, 
for a given integer 5 > wc define the radius-5 version of either (a) or (b), in which each bus of the 
grid is expected to perform the measurement over all links within 6 hops in the current topology. 
Note that even if S is large we are still constraining the measurement performed by a bus v to take 
place in the same component as v. We will discuss this topic in greater depth below. 

Putting aside this issue, we propose an affine control policy where at each round r < R, each 
demand bus v independently adjusts its demand by making use of a (precomputed) triple (c^, by, s'y) 
of parameters. 



Procedure 4.2 AFFINE CONTROL 

Input: a power grid with graph G (post-initiating event). Set = G. 
0. Compute triples (c^, b^, s^) for each r < R and v. 
For r = 1,2,..., i?- 1, do 

(comment: controlled round r of the cascade) 

1. Set = vector of power flows in C, and = the demand of any bus v. 

2. For any demand bus v, let be its data observation. 
Apply control: if > c^, reset the demand of v to 

min{l, [6; + ,;(c'- -<)]+}<. 

3. Adjust generator outputs in each component of G^ so as to match demand. 

4. Set = set of lines of C that become outaged as a result of the flows 
instantiated in Step 4. 

5. Set G^~^^ = G^ — C". Adjust demands and supplies in G^ . 

Round R. For any component K of G^, set = min|l , maxjgx{|//^|/'"j}|- 
If > 1, then any bus v of K resets its demand to d^/*^. 

In Procedure 4.2, round R handles cascade termination. Under the linearized power flow the 
rescaling guarantees that no line overloads will exist. Note that Step requires the computation of 
3{R — 1)D parameters, where D is the number of demand buses. Special cases of the control are: 

(1) Time- or bus-independent control: for any demand bus v, (c^,6^,s^) = (cu,6^,s^) for all 
1 < r < i? and some triple (c^,,, by, s^); or, for every demand bus v, (c^, 6^, s^) = {d', W , s^) for 
each 1 < r < i?, for a certain triple {d' ,¥ , s^). 

(2) Time-dependent, componentwise control. This is a control such that for any given component 
K of C, (c^, 6^, s^) equals a fixed triple (c^, 6^, s^) for every v ^ K. 

(3) Segmented control. Let (Si, S2, . . . , S//) be a partition of the demand buses. Then we insist 
that for any round r, (c^, 6^, s^) takes a common value for all demand buses in a given set Ej. 

An example of type (3) is that where the Sj are quantiles of the demand distribution. The resulting 
control is "fair" in that demands of similar magnitudes are reduced by similar fractions. Controls 
of type (2), in the case of the deterministic outage rule (F.l) can be explicitly described a-priori in 
polynomial space. This follows because in any fixed round r, C will have at most n components 
and these depend solely on the structure of the control on rounds up to r — 1. Thus in total at 
most Rn distinct triples (c^, 6^, s^) need to be specified. 

Our primary focus are on algorithms and implementations for the most general version (time- 
and bus-dependent controls) of our approach, using the outage rule (5)-(7) with memory. This is 
given in Section 6.1. In Section 6.2 we will discuss a special case where the optimal control can be 
efficiently computed. 

5 First set of experiments 

To motivate our overall approach, we first present experimental results using special, simple cases 
of the control. The objective of these experiments is to expose, first, the fact that in cases of severe 
contingencies the application of control so as to immediately stop a cascade may be myopic and 
could result in very suboptimal results. In other words, it may pay off to allow some lines to become 
outaged. However, this clashes with the intuitive notion that postponing action to much later in 
the cascade results in increasing uncertainty (because from the perspective of an agent at the start 
of the cascade, the later stages of the cascade should be more uncertain) . We want to measure the 



impact of uncertainty in the timing of control, and to contrast it with the goal of avoiding myopic, 
immediate action, as described above. 

5.0.1 The data 

In these experiments we used a snapshot of the U.S. Eastern Interconnect, with approximately 
15000 buses, 23000 lines, 2000 generators and 6000 load buses. The snapshot includes generator 
output levels, demands, and line parameters. 

Approximately 5000 of the line flow limits were zero, likely indicating a data error or missing 
data - the minimum nonzero line flow value was 3 x 10^^. When the flow limit of a line j was equal 
to zero, we proceeded as follows, where fj is the initial power flow value on line j: 

• If l/j^l > 10"^, we reset the flow limit to (1 + 7)|/°|, where 7 = 0.2. 

• Otherwise, we reset the flow limit to P = 10~^. 

A very small number of lines had positive and large capacity, but nevertheless the | fj \ values were 
close (or identical) to the line flow limits; in which case we increased the line flow limit by 25%. 

The reader may wonder about the impact of these numerical choices. Based on our limited 
testing with different choices of 7 and the impact is very minor in terms of p. The same holds 
for 7 unless we choose much values (on the order of 10^) and even then it is more a matter of degree 
than structure in the cascades we will describe below. 

Approximately 250 of the lines had negative reactance values. While there are valid reasons for 
the use of negative reactances, in the experiments below we assumed data error and replaced each 
negative value by its absolute value. 

5.0.2 Methodology 

In all the experiments the same approach was employed: first, we interdicted the grid according to 
a synthetic contingency, then we computed our affine control, and finally we studied the behavior 
of this control. 

To obtain contingencies we used the following methodology, which removes a a set of K random, 
high power flow lines from the grid, while preserving connectivity. Here -fC is a given, small integer, 
and as before m is the number of lines. 

(1) A spanning tree T is computed. 

(2) Let / denote the power flow vector corresponding to the given demands and generator outputs. 
Renumber so that |/i| > |/i| • • . > \ fm\- 

(3) Let < TT < 1. Run steps (a) and (b), initialized with 5 = 0, until stopping in (b): 
For j = 1, . . . ,m, 

(a) If Une j ^ T, then 

with probability vr reset S -f^ S U j. 

(b) lt\S\ = K, stop. 

(4) The set S of lines is removed from the network, producing network G in our cascade template. 
We used values of K ranging from 1 to 50, and for vr we used values ranging from .1 to .5. 

5.0.3 The experiments 

We considered a case with K = 2 (two lines removed) and R = 20 rounds. In the computation of 
the moving averages of line overloads (eq. (3)) we used a = 0.9. 

First we consider the pure deterministic case of line outages, that is to say we use line outage 
rule (3.2) with e,. = for all r. If no control is applied, then at the end of round of round 20 the 
vield foercentaee of demand still beine served) is 2.47%. The cascade is characterized bv extremelv 



high hne overloads; see Table 1 (we will discuss implications of this below). At the start of round 
1, in fact, the maximum line overload is 40.96, indicating that, likely, several lines with low flow 
limits are overloaded. 

We computed the best control where 

(i) = bl = 1 for all v and r. 

(ii) 8^ = for all v and 10 < r. Thus, no control is applied after round 10. 

(iii) For each 1 < r < 10, either = 0.005 for all -y, or = for all v. 

Thus we simply want to decide when to apply a control of a very simple form. Further, we are 

restricted to applying control in the first half of the cascade; this is done as protection against 
uncertainty in the later rounds of the cascade The rationale for the numerical values in (iii) is that 
1 + 0.005 * (1 — 40) « 0.80, that is to say, the application of this control in round 1 will "only" shed 
20% of the demand. 

We want to stress that the experiments in this section do not amount to a rigorous attempt at 
optimizing control. In fact, the control obtained through (i)-(iii) is only near-optimal. Instead we 
are trying to provide an example of the difference between an adequate control and the no-control 
option, and the questions that arise from the comparison. In particular, the amount 0.005 was 
arrived at through a simple grid-search process. 



Table 1: Cascade evolutions 
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In any case, the optimal control that satisfies conditions (i)-(iii) (and which we shall refer to 
as c20 for future reference) attains a termination yield of 75.2%, picks rounds 2 and 7 to apply 
control. Note that since the maximum line overload is high in round 1, c20 allows some lines to 
become outaged in round 1. 

This point is further elaborated in Table 1, where "r" indicates round and for each round, "k" 
indicates maximum line overload at the start of the round, "O" is the number of lines outaged 
during the round, "I" is the number of islands at the end of the round and "Y" is the (rounded) 
oercentaee of demand beine delivered end of the round. We stress that we count all islands. 



even those that consist of a single bus with no demand, and when computing the 
mEiximum Une overload we consider all lines, no matter how minor. 

Discussion. We see that initially both cascades have extremely high line overloads, many line 
outages and large amounts of islanding. However, under c20 after line 11 the overages are signifi- 
cantly smaller, and rapidly decreasing, and after round 8 the number of new outages and islands is 
also much smaller (and decreasing); both in spite of the fact that control is last applied in round 
7. Thus, effectively, the cascade has been "stabilized" under c20, long before the end of the time 
horizon. 

The reader might wonder about the rapid decrease of yield from 80% to 2% in the no-control 
case. This is due to the termination feature in our cascades that requires all line overloads to 
be eliminated by the end of the last round; since the no-control cascade has very high maximum 
overload (32.34), at the start of round 20, the termination rule forces a drastic reduction in yield. 

Nevertheless, in the no-control case, the combination of comparatively high yield (up to round 
4), high number of line outages, large line overloads and large amount of islanding suggest the 
possibility that many of the outages involve unimportant lines, and likewise with many of the 
islands (though of course a 22% yield loss should indicate a severe contingency). One wonders if 
somehow the no-control option might be attractive if enough time (i.e., rounds) were available. 

Table 2: Further evolution of no-control cascade from Table 1 
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To investigate these possibilities, we extended the no-control cascade. Table 2 shows the results 
for selected rounds. We see that the no-control approach finally yields stability by round 34, 
attaining yield 78%. This is slightly better (but very close) to what c20 obtained in 20 rounds 
(and, furthermore, control action under c20 was restricted to rounds 1-10). Nevertheless, the 
no-control approach experiences significant line overloads as late as round 32. 

By maintaining high overloads into very late rounds, the no-control strategy becomes more 
exposed to the unavoidable uncertainty that should be taken into account when modeling cascades, 
and which we have up to now ignored. We model noise by means of fault outage rule (3.2). In the 
following set of tests we assume that 

er = 0.01 +0.05* L^/IOJ. (8) 

Possibly, noise should be increasing at a faster rate than the above formula stipulates (perhaps 
exponentially). However, the control considered in Table 1 as well as the no-control approach are 

both exposed to significant amounts of noise after round 10; more so in the no-control case. We 
would thus expect that under rule (8) the no-control approach will perform much more poorly. 

To test these hypothesis, we ran 1000 simulations of cascades under rule (8) for the no-control 
case and for control using c20. The results are summarized as follows: using c20, the average yield 
is 42.90 and the standard deviation of yield is 27.47, whereas using no control the average yield is 
7.96 and the standard deviation is 9.33. In other words, c20 proves much more robust than the 
no-control strategy, which is not surprising given the structure of rule (8). A question that arises 
as a result is whether c20 is in some sense optimally robust. 

One way to investigate this question is to investigate controls that are less exposed to uncertainty 
by restricting them to a shorter timeline, i.e. by enforcing termination before round 20. For 
T = 10, 15, 25, we compute an optimal control required to terminate by round T, and otherwise 
subject to rules (i)-(iii), that is = 6^ = 1 for all v and r, = for all v and 10 < r, and for 
each 1 < r < 10, either = 0.005 for all v, or = for all v. We name these controls clO, cl5 
and c25, respectively. 



Table 3 presents the comparisons between all the options we have considered. In this table, 
"DetY" is the yield in the deterministic case (e^ = for all r), "MaxY" and "MinY" are the 
maximum and minimum yields in all the simulations (resp.), "AveY" is the average yield and 
"StddY" is the standard deviation of yield. 



Table 3: Robustness comparison - 1000 runs using stochastic outage rule (3.2) with 
noise as in (8) 



Option 


DetY 


MaxY 


MinY 


AveY 


StddY 


clO 


37.49 


39.05 


0.00 


11.81 


11.84 


cl5 


72.44 


71.85 


0.00 


33.94 


22.51 


c20 


75.19 


76.30 


1.17 


41.90 


27.47 


c25 


77.23 


42.34 


1.38 


11.99 


10.97 


no control 


77.75 


36.04 


0.00 


7.96 


9.33 



Control c20 emerges as superior over cl5 and clO. This can be explained as follows. Even though 
cl5 and clO are significantly less exposed to risk than c20, they are also restricted to operating, 
and terminating, during a stage of the cascade characterized by extremely high line overloads. 
Control c20, by being able to operate over 20 rounds, has "more time" while also avoiding the large 
uncertainty rounds 20 and higher. For this reason, c20 is also superior to c25 (their averages are 
separated by more than one standard deviation). One common feature that emerges in controls 
clO, cl5, c20 and c25 (not shown in the table) is that no control is taken in round 1, and control 
is taken in round 2 (and in the cases of clO, cl5 and c20, rounds 5 or 7). 

We stress that (8) is one categorization of noise. Using a different formula the outcome could be 
different, say cl5 could prove best. However, the outlook we are taking here is that by computing a 
robust control with respect to some rule such as (8) we obtain a control that remains robust (though 
possibly not optimally so) even if the model for uncertainty were to be somewhat changed. And, 
in any case, computing a control which is is somewhat robust should be better than completely 
ignoring uncertainty. 

To explore these issues, we study the following model 

€r = 0.01 + 0.005 * r, (9) 

which can be considered a smoothed version of (8). Under this model both cl5 and c20 are exposed 
to more noise than clO, and more noise than under rule (8). Consider Table 4. We see that c20 



Table 4: Robustness comparison - 1000 runs using stochastic outage rule (3.2) with 



Option 


DetY 


MaxY 


MinY 


AveY 


StddY 


clO 


37.49 


38.93 


0.00 


7.54 


9.55 


cl5 


72.44 


63.94 


3.41 


28.02 


17.94 


c20 


75.19 


73.04 


0.00 


32.24 


21.30 


c25 


77.23 


54.62 


0.25 


16.84 


12.66 


no control 


77.75 


18.86 


0.00 


5.11 


5.28 



still appears superior to the other controls, though cl5 is almost as good. 



The above experiments do not amount to a full optimal robust control computation. In Section 
8.1 we will return to these experiments from a stochastic optimization perspective. 



6 Optimization methods 

Given a control vector (c, 6, s), denote by 0^(c, 6, s) the final demand at termination of the R- 
round cascade controlled bv ic.h.s). Our eoal is to maximize Q^ic.h.s) over all controls. This is 



a nonconcave, in fact very combinatorial, maximization problem [7], [22]; it is very large (e.g. if 
i? = 10 the (c, b, s) vector has more than 180000 variables in the case of the Eastern Interconnect). 
It is also important to incorporate stochastics. 

In principle, the deterministic case of our problem could be tackled using mixed-integer pro- 
gramming techniques, and the stochastic version, using stochastic programming [21]. Of course, 
one could choose a different formulation of the cascade control problem than the one we chose 
(using a different kind of control, for example). But any formulation will have to deal with the 
combination of combinatorics in the network dynamics, multistage behavior, stochastics and very 
large size. In our opinion, this combination places the problem outside the capabilities of current 
optimization methodology, even in the deterministic case. We remind the reader that we envision 
our control as being computed in real time and we might only have one hour, or less, to do so. 

Another point to stress is that nonconcavity in a maximization problem leads to non-monotone 
behavior: in our case, just because a small change in control leads to an improvement does not 
imply that a larger change will result in greater improvement. 

6.1 First-order methods for the general case 

Here we describe a procedure to compute a control given by triples {cl,bl,sl), for each demand 
bus V and each round r, and using a control law as in Step 2 of algorithm (4.2). We will assume a 
semi- random outage rule as in (5)- (7), with memory, as in (3). As previously, the goal is to compute 
a control that will maximize the expectation of the amount of demand being served by the end of 
round R, which as before we denote by @^{c,b, s). We stress that the control parameters we use 
are state-independent; this is a design feature. 

Toward this goal we will use an algorithm based on the following template: 



Procedure 6.1 First-order algorithm 

Input: a control vector (c, 6, s). 
For A; = 1,2, ... do 

1. Estimate g = VG^(c, b, s). 

2. Choose "step-size" fi > and update control to (c, 6, s) -|- fJ-igc^ gb, 9s)- 

3. If IJ, is small enough, stop. 

This is a common first-order (steepest-ascent) method. In the deterministic case. Step 1 should 
be interpreted as a an approximate rule since is not differentiable (our stochastic outage rule 
3.2 does smooth out the expectation). The vices of procedure 6.1 are well known: even if 6^ were 
smooth, its nonconcavity implies that the steepest-ascent method may not converge to a global 
optimum. And even if @^ were smooth and concave, steepest ascent may zigzag or stall. See [22]. 

In summary, Procedure 6.1 should be viewed as a local search method with which to explore 
the neighborhood of a solution. Finally, in our setting the procedure could prove expensive, since 
each evaluation of (including in the estimation of VG^ through finite differences) requires a 
cascade simulation, each round of which requires two power flow computations in our setup. 

On the positive side, however, the procedure is flexible enough to handle (at increased compu- 
tational cost) important features, such as more realistic AC power flow models, or more complete 
renditions of low-level controls in the operation of a power grid. Essentially, Procedure 6.1 is an 
example of simulation-based optimization, i.e. it only needs to have a "black-box" that computes 
the function G^. 

An active research field that considers optimization under such assumptions is that of derivative- 
free optimization (see [13]) and related methods that incorporate second-order information [26]. In 
our estimation, these methodologies may not scale well to problems of the size we consider. In 
forthcoming work we will experiment on adaptations of these methodologies to our problem. 

When we consider a model that includes stochastics, the first-order method can be viewed as 
a stochastic gradients algorithm (see [24], [19] - an alternative methodology is provided by bundle 



methods). In the stochastic gradients approach, a fixed sample path of the appropriate random 
variables is chosen in advance of each gradient and step-length computation. In Section 8 we will 
further discuss this approach. 

Whether we use the stochastic setting or not, we cannot completely rely on Procedure 6.1 as the 
sole optimization engine - to repeat the above, the resulting algorithm would both be too slow and 
likely to get trapped in local maxima. To help avoid these difficulties we rely on several heuristics 
described later. In the next section we describe a special case of the optimal control problem that 
can be efficiently solved. 

6.2 The optimal scaling problem 

In this section we describe an algorithm that computes an optimal time-dependent componentwise 
control under outage rule (F.l), without memory. Either version of rule (F.l) can be used; for 
simplicity of language we will use the first. For brevity, we will refer to this as the simple scaling 
setting. Our algorithm computes an optimal control in time 0{m^~^ / {R — 1)!) where as before m 
is the number of lines. 

Remark 6.2 Consider an optimal control. Let 1 < r < R and let K he a component of G'' under 
the optimal control. Then (at round r) we will scale all demands in K by a common multiplier 
< < 1 (defined as in Procedure 4-2. Clearly, the control can be equivalently defined by the 
values Xj^ (< 1) rather than the triples (c^k^^k^^k)' ^''^^ '"^^ convention below. 

Notation 6.3 Let G be a graph, and let jj, be a supply-demand vector on G. We denote by f{G, n) 
the unique, feasible flow vector on G when fj, is the supply-demand vector (see Remark 2.1). 

In what follows we assume that we have a given supply-demand vector fi. Let R be the number 

of rounds for the cascade. Our problem is to compute a control that maximizes the total demand 
satisfied after R rounds, assuming that at the start of round 1, {5 is the supply-demand vector. We 
will solve this as a special case of a family of problems: 

Definition 6.4 For t > real, denote by Qq {t\(3) the final total demand resulting from applying 
an optimal control in an R-round cascade on graph G, where the initial supply-demand vector is 
tp. 

We wiU show that e)^'{t\P) is a nondecreasing piecewise linear function of t with at most Rn 
pieces. 

Remark 6.5 Let a be a supply-demand vector on graph G. Let t >0. Then f{G,ta) = tf{G,a). 

Lemma 6.6 Let fi be a supply-demand vector. Suppose G is connected. Then @Q\t\iJ,) is a 
nondecreasing piecewise-linear function of t with two pieces. 

Proof. Note that since R = 1, only Steps 1 and 2 in algorithm (4.2) will be executed. Further, 
writing / = f{G^, a), when running (4.2) starting with the initial supply-demand vector tfi, we will 
have f^ = tf in Step 1, and writing 'tp = maxj \fj\/uj, we have that maxj \fj\/uj = tip. Denoting 
by D the sum of demands implied by we have as per our cascade termination criterion that the 
final total demand at the end of i? = 1 rounds will equal 

tD, if t < l/ip, and (10) 
^ D = ^ D, otherwise. ■ (11) 

ti) i) ' ^ ' 

Now we turn to the general case with R > 1. We assume, without loss of generality, that G^ is 
connected. Let / = f{G^,P). 

Definition 6.7 A critical Doint is a real ^ > 0. such that for some line i. o'fi = -Ui. 



Recall that we assume Uj > for all j; thus let < 71 < 72 < . . . < 7p be the set of all distinct 
critical points. Here < p < m. Write 70 = and 7p+i = +00. 

Definition 6.8 Forl<i<p let F' = {j e A : jh\fj\ =Uj}. 

Now assume that the initial supply-demand vector is tp with t > and let < < 1 be the 
optimal multiplier used to scale demands in round 1 (see Remark 6.2). Write 

q = argmax{/i : jh <t}- (12) 

Thus, t < jq+i, and so X^t < 7^+1. We stress that these relationships remain valid in the boundary 
cases q = and q = p. 

Notation 6.9 Let the index i he such that \^t G (7j_i,7j]. 

Note that in Step 3 of round 1 of algorithm (4.2) we will scale all demands by A^, and since we 
assume is connected, in Step 4 we will also scale all supplies by A^. Thus, for any h < i — 1, 
and any line j e F^, we have that after Step 4 the absolute value of the flow on j is 

^'t\f,\>^H\fj\=uj, (13) 

and consequently j becomes outaged in round 1. On the other hand, for any line j ^ Uh<i-iF^ , 
the absolute value of the flow on j immediately after Step 4 is 

X't\fj\<^i\fj\<uj, (14) 

and so j does not become outaged in round 1. In summary, the set of outaged lines is Uh<i-iF^; 
in other words, we obtain the same network G'^ = — U^'l^F'* for every t with X^t G (7j_i, 7^]. 

Notation 6.10 For an index j, write /C(j) = set of components of G^ — uj^^^F'^. 

Let H E IC{i — 1). Then, prior to Step 6 of round 1, the supply-demand vector for H is precisely 
the restriction of X^tf3 to the buses of H, and when we adjust supplies and demands in Step 6, we 
will proceed as follows (where we use notation as in Section 2): 

• ifE s&vnni^'^^^l^s) > ^seGnni^^'^l^s) then for each demand bus s eVCiH we will reset its 
demand to 

where r = ^^s) ^ Es€gnij(^g) 

sevnH 

and we will leave all supplies in H unchanged. 

• likewise, if Ylse'DnH{~'t'^^^s) < YlseQnHi^^'^^s) then the supply at each bus s e Q Ci H will 
be reset to 

rXHPs, where r = -%^n^(^,), 

Z^s&gnH 

but we will leave all demands in H unchanged. 

Note that in either case, in round 2 component H will have a supply-demand vector of the form 
X^t(3^, where f5^ is a supply-demand vector. Thus an optimal control on H, on rounds 2,...,R, 
will yield a final total demand 

e^-^)(Ait|/3^), (15) 
which, inductively, is a nondecreasing function of Ai^, and therefore is largest when 

A^ = min(l,^l. (16) 



Case 1. Suppose i < q. As noted above, by definition (12) of q we have tliat < < t. Thus, 
the expression in (15) is maximized when Ai = y, and we obtain final (i2-round) demand equal to 

E <"'^(7^l/3''), (17) 
HeK.{i-l) 

which is independent of t. 

Case 2. Here q < i, and so i = g + 1 by definition of q and < 1. Thus (15) is maximized by 
setting A^ = 1. The final demand equals 



In summary, we have: 



max < 



max 
l<i<g 



E ef-'\t\p^). (18) 



E 0S?"^(7.i^^) , E Q^H-'\t\p^) \. (19) 

HeK.{i-l) I H&K{q) 



Theorem 6.11 (i) 0Q^^(t|/3) is nondecreasing, piecewise-linear, with at most 



breakpoints. 

(a) The optimal choice for A^ is A^ = 1 or A^ = ^yk/t for some k. 

Proof, (i) By induction on R, starting from Lemma 6.6. For the general step, consider the above 
discussion which assumes that X t G (7j-i,7i]. Then if Case 1 above holds, wc have that 6^^*1/5) 
is constant. And if Case 2 holds, then equation (19) applies. The form of (19) guarantees that, 
inductively, @q (^|/?) is nondecreasing piecewise-linear. 

( 

To analyze the number of breakpoints in 0^ , assume first that R = 2. Consider the effect of 
removing, one at a time, the lines of U^^^F^. Prior to its removal, each line j has both ends in the 
same component K; the removal either creates two new components (if j is a bridge of K) or creates 
a new component (which differs from K in that line j is not included). Thus the removal process 
can be represented as a binary tree whose leaves correspond to the components of — U^=i-F'^, 
i.e. the members of /C(p). Since these are disjoint there are at most n of them; since in a binary 
tree the number of degree three vertices is at most the number of leaves we conclude that 

I U^=i <m + n<2m + 2. 

Furthermore, let H he a. component in U^=x/C(/i). Define h = min{j : H £ /C(j)} and h' = max{j : 
H G By Lemma 6.6, it follows that will contribute at most one breakpoint to @^q\ 

and that this breakpoint will occur for some t with A-*^* G [7/1,7/1')- The maximum in (19) shows 
that for each q, one additional new breakpoint is created. Thus, in total, 9^^^ has at most 0{m) 
breakpoints and the result is verified for R = 2. 



In what follows we assume that R> 3. Suppose q = and thus i = 1. Since Ait < 71, it follows 
that no lines are outaged in round 1, i.e. = = G, and in subsequent rounds no line will be 

( R^ 

overloaded. Thus, in this case, e'^'{t\P) =tD and there are no breakpoints. For g > we proceed 
using (19). For each H G /C(g), inductively, G^~^^ has at most 

2 

"T-H , max{l,i?-3} 



breakpoints, where mn denotes the number of hnes in H and c > is a constant. So (19) implies 
that subject to i = q + 1, the number of breakpoints in Qq is at most 



1+ E 



m 



R-2 
H 



< 1 + 

< 1 + 



m 



max{l,il-3} 



{R-2)\ 

I \R-2 

jm - q) 
{R-2)\ 



■ + c[m-\ul^,F'^\) 

I / \max{l,R— 3} 

+ c[m — q) ^ ' 



^ N max{l,ii-3} 



(20) 



since ^\=\F^^ fl = for each H G /C(g). Summing this expression over all 1 < < p, we obtain 
that the total number of breakpoints is at most 



V 



m 

< m + ^ 

(m — 1) 



{m — q) 



R-2 



q=l 
R-l 



{R-2)\ 



+ c{m — q) 



max{l,ii— 3} 



< 



{R 



+ 0((m- 1)^-2)+ m + c^(m - q) 



max{l,i?— 3} 



(21) 



3=1 



For i? = 3 the last three terms in (21) are 0{m) and we are done as desired. For > 3, the last 
term in (21) equals 



c- — — z + 0(m^~3) 



R-2 

and again we conclude as desired for c large enough. 



(22) 



(ii) This follows from the discussion leading to eq. (19). ■ 

Part (ii) of Theorem 6.11 illustrates a weakness of the simple scaling approach - when applying 
an optimal control, at least one line becomes fully loaded at each round. Such a strategy is likely 
non-robust. We plan to address this issue in upcoming work; using the stochastic outage (F.2) and 
computing an appropriate optimal control. 

Despite the apparent shortcomings of the method, and of the simplicity of the proposed control, 
the ability to compute a global optimum in polynomial time (for fixed R) is a significant asset, 
especially as a starting point for the simulation-based methods for the general problem that are 
proposed below. In forthcoming work we will implement an appropriate version of the above 
algorithm; an relevant question is whether the worst-case bound in Theorem 6.11 is attained using 
realistic data. 



7 The algorithm 

Our algorithm implements Procedure 6.1 to implement an affine control as in Template (4.2), 
repeated here for convenience. The control specifies, for each round r of the cascade and each 
demand bus v, a triple {c^, b'^, s'^). At round r < i? of the cascade, each demand bus v observes the 
maximum line load in the component that v currently belongs to. Then, where d'^ denotes the 
current value of the demand at v, 

if k'^ > c^,, demand at v is reset to min{l, [6^ + s^(c^ — kJJ)]^} d^- (23) 

The "normal" case of such a control is that where b[, = c^, = 1, and sj^ > 0, which decreases demands 
in oroDortion to the maximum overload. However, cases with d. ^ 1 fdelaved or oroactive control) 



can prove optimal. Setting 6^ < 1 can result in nonsmooth (fixed-penalty) controls. Finally, setting 
< 0, though counterintuitive, can prove optimal when non-monotone behavior occurs. 

In order to initiate the gradient search method, we rely on grid-search, a standard enumerative 
idea: 

Grid search. Here we fix = 6^ = 1 for all r and and = for all v and all 2 < r < i?. 
Thus the only remaining parameters are for all u and r = 1,2. We restrict the search to two 
values and s^, and insist that for all v and 1 < r < 2 we have sj^ = . In our current imple- 
mentation, this two-dimensional search, in turn, is carried out one parameter at a time, as follows. 
Let Ik}- be the maximum line overload observed in the no-control cascade, during round 1. As- 
suming > 1, then we enumerate all choices for of the form = (.1 -f .008i)/{k^ — 1) for 
i = 0,1, . . . , 100. In other words, this enumerates all controls where in round 1 we scale demands 
by a factor of .9 — .008 i for i = 0, 1, . . . , 100. Let s\ < $2 be the two enumerated choices which 
produce the highest and second highest 9^ value. Then we repeat the search in the interval [si, S2] 
by enumerating = s\+i (sg — s})/100 for i = 0, 1, . . . , 100. The value that produces that highest 
value is our final choice for s^. We fix this value and now carry out the same type of search for s^. 

We will see below that grid search can produce very good control vectors, but which in general 
can be improved, sometimes significantly, by widening the search. One can use the control computed 
by grid search to start the general gradient search; however in high-dimensional cases even general 
gradient search itself can be quite slow as each gradient estimation step could prove very slow. 
This will not be the case if enough parallel computing resources are available; however and we have 
found an additional step to be useful: 

Segmented search. As introduced in Section 4.1, consider a fixed partition (Ei, S2, . . . , T,h) of 
the demand buses. We search, using the first-order method, for triples of the form (c[,6[,,§[) for 
each 1 < r < i? and 1 < i < i/, so as to obtain the control where for each 1 < r < R, and 
each demand bus v, {c'^,b^,s^) = (c[,6[,s[) if t; G Sj. In our implementation, the Sj are demand 
quantiles. That is to say, if L is the number of demand buses, then Si contains the IH/L\ buses 
with largest demand, 112 contains the next [H/L\ buses with largest demand, and so on. The 
advantage of this approach is that it considerably reduces the dimensionality of the problem, even 
if H is chosen relatively large, such as H = 100. In fact, the (segmented) first-order method runs 
quite fast, and the approach in [26] might also be practicable. Further, a segmented control is 
arguably 'fair' in that it specifies, to some degree, that similar buses are bound by similar control 
laws, though we stress that when applying the control (23) the actual demand reduction can be 
very different for two buses in the same segment but in different components. 

In the first set of experiments we have conducted, we have chosen H = 50 and we fix ¥■ = 1 
ioT 1 < r < R and 1 < i < H. Thus, altogether, we have 2H{R-1) = 100 {R - 1) variables, still 
large but much more manageable than full gradient search. In the experiments below, we forgo 
full gradient search; however it would be straightforward to follow up segmented search with full 
gradient search. 

7.1 Implementation details 

The parallel implementation of our algorithm relies on the familiar master-worker paradigm. Each 
worker performs computations of the function @^{c,b,s) for a given control (c, 6, s) whereas the 
master carries out the gradient search algorithm. In the experiments we report on here, we use 
the linear power flow model; linear programs are solved using Cplex 12.0 [18] and Gurobi 3.0 [15]. 
These solvers were used with all presolve options turned-ofF (this increased robustness). Further, 
the flow component in the solution to the linearized power flow system (l)-(2) is invariant under 
scaling of the X vector; we scaled all reactances so that the largest value was 100 (also for solver 
robustness). Interprocess communication in our algorithm uses Unix sockets. The computations 
below were performed on three eight-core i7 machines with 48GB of RAM each. 



7.2 Second set of experiments 

As before we use the Eastern Interconnect snapshot for our experiments. Synthetic contingencies 

were developed by removing K, random, highly loaded lines, as in Section 5.0.2. 

Our first set of experiments, shown in Table 1, concern cascades with R = 4 rounds. When applying 

rule (F.l), we used a = 0.55. As stated above, the segmented search was performed using if = 50 

segments. 

Table 5: Performance of algorithm on 4- found cascades 



K 


yield, 


yield, 


wallclock 




no control 


control 


(sec) 


1 


90.04 


95.03 


268 


2 


1.25 


50.13 


174 


5 


32.94 


81.05 


214 


10 


2.02 


36.97 


194 


20 


1.64 


27.84 


220 


50 


0.83 


16.96 


477 



In this table, the columns headed 'yield' indicate the percentage of total initial demand satisfied at 
the end of the cascade (without control, and using the computed control), and 'wallclock' is the 
observed parallel running time of the method. In each of these runs, the total number of gradient 
steps was small, typically smaller than 5. 

Note that in the case K = 1 the interdiction has limited effect, but even so the control is able to 
recover additional demand. In the case K = 5 the demand loss in the no-control case is substantial, 
but so is the benefit of the control. Finally, in the cases K = 2, 10, 20, 50 the network collapses 
but the control sill recovers a significant amount of demand. More experiments of this type will be 
forthcoming. 

In the next set of experiments we use the case K = 50 in Table 5 to investigate in more detail 
the behavior of the algorithm as R increases. We used a = 0.5 for all these experiments. Note that 
keeping a constant but increasing R effectively considers cascades that take longer from a 'real 
time' perspective, thereby giving more power to an agent applying control. If, instead, we were to 
increase R while also decreasing a, thus giving more weight to 'history', we would be able to model 
cascades that last for a fixed period of time, but where the individual rounds encompass shorter 
spans of time. 

Table 6 reports on the experiments. As before, 'yield' is the percentage of demand satisfied 
at the end of the cascade, using no control, the control obtained by grid-search, and the control 
obtained by segmented gradient search (started at the control computed by grid search). The two 
wallclock columns report, in seconds, the time used by grid- and gradient-search. In the case of 
grid search, we report the time spent on each of the two search steps (i.e., over rounds 1 and 2, 
respectively). The column labeled 'grad steps' reports the number of gradient steps. 

Table 6: Impact of increasing number of rounds on K = 50 case from Table 5 



R 


yield 


yield 


yield 


wallclock 


wallclock 


grad 




no control 


grid 


gradient 


grid 


gradient 


steps 


5 


4.13 


18.11 


31.86 


30 + 17 


1340 


7 


6 


2.02 


23.01 


25.86 


26 + 14 


657 


6 


7 


2.25 


25.10 


25.98 


33 + 15 


434 


3 


8 


0.78 


29.27 


46.97 


18 + 43 


3151 


10 



Next we will comment on Table 6. 



Computational workload 

Consider the case R = 8. Since we are using H = 50 segments, we have altogether 100 control 
variables c[ and s[ per round r. Since there are 7 rounds during which we will apply control, we 
have a total of 700 individual variables. Each partial derivative estimation requires two simula- 
tions; thus in total each gradient estimation entails 1400 cascade simulations. Per iteration, the 
step-size computations require 200 additional cascade simulations; for a total of 1600 simulations 
per iteration of Procedure 6.1. The case R = 8 required 10 gradient iterations, and thus in total 
16000 simulations. Each 8-round simulation (of the 15000-bus Eastern Interconnect, and using one 
core of the 17 CPU) requires, on average, 4.5 CPU seconds. This is primarily due to the two power 
flow computations per round, and linear solver data structure cleanup at the end of the simulation 
(and to a much lesser degree, to graph algorithms used to identify islands) . Thus in total the com- 
putation of the R = 8 case required approximately 72000 CPU seconds. Since we have 24 worker 
cores, this translates to approximately 3000 wallclock seconds. The balance of time with respect to 
the actual wallclock time in Table 6 (i.e., 151 seconds) is due to inter-process communication and 
networking delays, and logging of statistics to disc by the master. On a per-simulation, per-core 
basis, this amounts to 151 * 24/16000 0.22 seconds, or roughly 5% as compared to 4.5 seconds 
total per simulation. 

Grid-search vs gradient-search 

In several cases gradient search significantly improves on the grid search solution. This is especially 
noticeable in the R = 8 case, and we will examine this case in some detail. 

First, the grid-search control wc computed in this case uses (ci,6i,.si) = (1,1,0.00018), and 
{(f' ,b''" , s"^') = (1,1,0) for r > 2, effectively limiting control to the first round. In contrast, the 
gradient-search control we computed applies control as late as round 7 (which, as we have 8 rounds 
in total, is the last round for which we compute a control as per our control template (4.2)) , and 
within earlier rounds it applies different controls to different segments. In particular, in round 1 the 
gradient-search control uses the control vector (0.95, 1, —0.0499) for segment 1 (the highest demand 
segment) as well as two other segments, while for all other segments it uses (1,1,0.00018). And 
in round 7 it uses (1.1. 1,0.05) for segment 2, and (0.95, 1,0.05) for segment 3, while for all other 
segments it uses (1, 1,0). Other controls different from (1,1,0) are used in rounds 2 and 4, while 
on rounds 5 and 6 it uses (1,1,0) throughout. 



Table 7: Controlled cascade evolutions 
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1.00 
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Table 7 compares the cascade evolution under both controls. Here, we report the value, at the 
end of each round of: k. (the maximum line overload); the number of components of the network; 
and the yield, 'faults' is the number of line outages experienced during each round. Comparing the 
two evolutions, we see that the gradient-search control allows significantly more outages in initial 
rounds (as well as much more islanding). Nevertheless, the number of outages is (nearly) monoton- 
ically decreasing under gradient-search and by round 5 it is smaller than under grid-search. Thus 
the gradient search control appears to be maintaining high yield while carefully allowing outages 
to take place. During round 7, gradient-search makes a large improvement on the maximum line 



overload (from 13.27 to 1.01) but without losing much yield; this is evidence of yet more sacrificial 
line outages. 

Qualitative issues 

A comparison of the entry in Table 6 for R = 5 and those for R = 6,7 might appear to indicate 
that the controls computed for R = 6 and 7 arc locally optimal, because the control that achieves 
yield 31.86% for i? = 5 "should be" feasible for aU R>5. 

While it is true that the controls in Table 6 can all be improved upon, the argument in the 
above paragraph is not quite correct. Refer to our Cascade Control template 4.1. The termination 
step constitutes a last-recourse form of control - if there are line overloads at the start of the last 
round, loads are scaled so as to remove the overloads, and in that case the cascade is considered 
terminated, regardless of history (and of particular, of rule (3). We model termination this way 
on purpose, so as to provide an agnostic termination criterion that does not depend on numerical 
parameters of our model, in particular, a. Consider Table 8. 

Table 8: Maximum line overload at end of each round for K = 50 case from Table 6 
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In this table, the columns labeled "Cfc", for k = 5,..., 8 represent the controls in Table 6, 
whereas "None" means no control. The table shows, for each round, the maximum line overload at 
the end of that round, for each control option. We see that C5 reaches the start of the termination 
round, round 5, with maximum overload 1.7232; the current yield at the start of round 5 is 54.90% 
(not shown in the Table) and most of the demand is in one island. Hence the termination step 
win scale demands by 1/1.7232 and yield will drop to 54.90/1.7232 31.86, as Table 6 shows. As 
per our rules, this terminates the cascade, although since a = 0.5, and because the end-or-round 3 
maximum overload is very large, the maximum history-dependent line overload will be much larger 
than 1.732 (it should be at least 0.5 * 36.79 = 18.40. Hence control C5, if implemented in a cascade 
with 6 or more rounds, will not result in a stable state by the end of round 5. 

Another point that emerges from Table 8 is that C5 and Cs tend to maintain higher line 
overloads late into the cascade - this is a severe cascade, and having more time to apply control 
pays off. But by doing so C5 and Cg are likely less robust. Rather than performing the same 
robustness analysis as in Section 5, we will next consider the stability of the controls with respect 
to the a parameter in eq. (3) which in the above tests was set to 0.5. 

This is a delicate issue, because the value of a is related to the time duration of a round, and 
thus the structure of an optimal control should depend on a (in other words, how much time we 
have impacts what kind of control we can apply). The question is how stable a control remains as 
a is perturbed. 

In Table 9 we show the yields obtained by running the controls from Table 6 using their 
respective numbers of rounds, but using different values for a. We see that in terms of the deviation 
from the nominal case (i.e., a = 0.5), Cq and C7 prove the most stable, significantly less so and 
Ce is very unstable. It is still the case that C% remains best overall: this is due to the severity of 

the cascade. 

In Figure 1 we consider a broader experiment along the same lines. For this experiment we 
used a control intended for the i? = 8, K = 50 case considered in previous sections, and computed 
assuming a = 0.5 (as per the memory-dependent rule (3) for arc outages). 



Table 9: Stability of controls in Table 6 as a function of a 
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Figure 1 displays the actual yield as a is changed away from 0.5. We note that yield is rel- 
atively robust for a larger than but close to 0.5, but not if a is decreased. We believe that this 
behavior is due to application of our (deterministic) control results on lines becoming 100% loaded. 

Several heuristics, based on "padding" (increasing) or "tightening" (reducing) the flow limits uj, 
suggest themselves. However the greedy nature of deterministic controls will remain a fundamental 
difficulty. Section 8 discusses the computation of controls under stochastics. 

7.2.1 Additional tests 

The next set of experiments address basic conjectures that arise in the context of the type of control 
that we consider: 

• It is best to stop the cascade in the first round, i.e. to sufficiently reduce demands in the first 
round so as to eliminate all line overloads. 

• It is best to apply control in the first round only, and ride out the cascade for the remaining 
rounds. 

In fact it is a simple task to create small examples where both conjectures above are proved wrong. 

Instead we explore these questions using the Eastern Interconnect, with a random interdiction of 
the type described above with K = 50, three roTinds, and a = 1 (no memory, and thus we obtain 
outage rule (F.l)). The results of this experiments are as follows: 

• Using no control, after three rounds 0.63% of the demand is satisfied. 

• Grid search produces a control with yield 45.46%. 

• Starting from this control, and using segmented search with 50 segments improves yield to 
50.02%. 

To gain a different perspective on this cascade, consider Table 10, which shows the distribution of 
line overloads at the start of the cascade. 



Table 10: Distribution of line overloads 
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Where / denotes the flow vector at the start of the cascade, the table indicates the quantity 
of lines j whose (rounded-up) overload rifj|/iiil equals a particular value. Thus, 181 lines j 
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Figure 1: yield as a function of alpha 



are such that 1 < \fj\/uj < 2, 18 lines j are such that 2 < \fj\/uj < 3, and so on. One line 
has overload greater than 1504.93. We will provide a more detailed analysis of this case in the 
near future; however it is the case (as may seem plausible from the table) that in order to stop 
all overloads in round 1 a drastic reduction of demands is needed. We will instead describe the 
behavior of the optimal control computed by grid search has (cj,6j,s^) = (300.7,1,0.0004), and 
(c2,62,s2) = (1.26,1,0.62). 

Thus, in round 1, all demands will be scaled by a (approximately) factor of 1.0+0.0004* (300.7— 
1504.93) = 0.51831. Considering Table 10, we see that in round 1 all lines included in the columns 
with overload greater than 2 will be outaged - this is a total of 41 lines, and in fact three more 
with overload close to 2 are outaged. 

At the start of round 2, the maximum overload is approximately 1.36194. Thus, the control 
specifies that demands will be reduced by a factor of 1.0 + 0.62 * (1.26 — 1.36194) = 0.9368. This 
does not completely remove all overloads and an additional 4 lines become outaged. 

Finally, at the start of round 3, the maximum overload is 1.067891. By the rules of our cascade 
model, this overload is now removed by scaling down all demands. Altogether, we obtain a yield 
of 0.51831 * 0.9368/1.067891 = 0.4547, as stated above. 

8 Stochastic optimization 



To further motivate the need for stochastic modeling, we consider the same setup as for the experi- 
ment in Figure 1: ii = 8, = 50 and a = 0.5. We simulated the behavior of the computed control 



using the stochastic outage rule (5)-(7), repeated here for convenience. We are given a parameter 
< e < 1; if / is a flow vector, then line j is not outaged if \fj\ < (1 — e)uj, it is outaged if if 
l/jl > Uj, and is outaged with probability 1/2 if (1 — e)nj < fj < Uj. 

For various values of the tolerance e, we simulated the cascade 10000 times. For e < 0.03 little 
difference was observed with the nominal (deterministic) setting in that the average yield was close 
to (the deterministic yield of) 50%. For e = 0.03,0.10 and 0.20 the results are displayed in Figure 
2. 



COUNT 



Ol 

o 



o 
o 
o 



cn 
o 
o 



ro 
o 
o 
o 



O 

o 



ho 

-< Ol 

m 
|— 
o 

o 

CO 



l\3 

O O 00 



Figure 2: yield histogram under stochastic outages 



The figure shows, for each value of the yield (and for each of the three choices for e) how 
frequently that yield was observed. Note that for e = 0.03 the distribution is essentially trimodal. 
This is typical behavior and it points to a small number of critical lines, which, in turn, result in a 
small number of cascade trajectories being overwhelmingly likely. For e = 0.20 yields close to zero 
are observed, but, significantly, the average is positive. 

In a stochastic setting, the yield 0^(c, b, s) resulting from a control (c, b, s) is a random variable. 
Below we discuss different methodologies for computing a (locally) optimal stochastic control, with 
the obiective of maximizine the exoectation E^B^). Other merit criteria are also of interest (and 



possibly better), such as a linear combination of expectation and variance E(0^) — Avar(0^) 
(A > 0), or a Sharpe-ratio quantity E(0^)/var(0^). 

The computational challenges inherent in any of these tasks are significant. First, as displayed 
in Figure 2, yield variances can be extremely large. From a theoretical perspective, the number 
of samples needed to obtain reliable estimates become inordinately large. Additionally, there are 
subtle difficulties due to the non-monotonic behavior of power flow systems (see [6]), which is 
reminiscent of Braess' paradox [8]. 

An example of this behavior is provide by our stochastic outage rule (3.2). Note that under this 
rule, a line is more likely to become outaged than under the deterministic rule (i.e., when {l—er)uj < 
fj < Uj the line may become outaged in the stochastic setting; not so in the deterministic setting). 
Nevertheless, one can produce cases where the deterministic yield of a control is smaller than a 
sample yield of the same control under rule 3.2. This phenomenon slows down convergence of our 
algorithms and makes algorithm calibration difficult. 

8.1 Optimization methods through simulation 

In cither the first-order procedure (6.1) discussed above, or in the grid-search setting, wc obtain 
a counterpart valid for a stochastic setting by replacing each (deterministic) evaluation of a yield 
0^(c, b, s) by an estimation of E 6^(c, b, s). 

This is the approach used in the next set of experiments, which parallel those in Section 5. For 
convenience, we restate the setup for these tests. 

First, two random, though adversarially chosen lines were removed from the grid. Next, we 
compute the best control such that 

(i) c\, = bl = 1 for all v and r. 

(ii) = for all v and 10 < r. Thus, no control is applied after round 10. 

(iii) For each 1 < r < 10, either = 0.005 for all -y, or = for all v. 

This was done, in the deterministic setting, while setting the maximum number of rounds, R, to 

10, 15 20 and 25. In Section 5, we observed that the cascade is characterized by high line overloads 
during the initial rounds; nevertheless if "enough" rounds are allowed (34) then without control the 
cascade stabilizes and produces approximately 78% yield, and this was slightly superior to what 
was obtained by the controls we computed. In summary, this case provides a good contrast between 
the (opposing) need to "wait out" an initially severe cascade on the one hand, with uncertainty 
growth as we increase the number of rounds. Using the framework of stochastic outage rule (3.2), 
with 

€r = 0.01 0.05* [r/lOj, (24) 

wc observed that the (deterministic) 20-round control appeared superior. 

In this section wc will repeat these comparisons, except that now we will compute controls of 
the form (i)-(iii) that (approximately) maximize the expected yield. We compute such controls by 
modifying our grid search: we evaluate a control vector by simulating 20 cascades and computing 
the sample average yield. This can be a somewhat coarse approximation because, depending on 
the model for noise, many more than 20 samples may be needed for an accurate answer since the 
variance of yield can be high. 

For R = 10, 15,20, denote by kR the control computed by the algorithm. Table 11 shows the 
results of our experiments. In the table, for each R, 'ave kR' and 'std kR' arc the estimates of the 
average and standard of the yield of kR using A'^ = 20 and N = 1000 samples. Finally, 'ave cR' and 
'std cR' are the 1000-sample average and standard deviation of yield of the deterministic controls 
c20, cl5, clO, computed in Section 5 (as in Table 4). 

We see that the kR controls are uniformly superior to their cR counterparts, sometimes by 
almost one standard deviation. We observe that k20 is superior to kl5, and much superior to klO. 
This Darallels observations made in Section 5. 



Table 11: Stochastic grid search results in case from Section 5 
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8.2 Stochastic gradients 

The stochastic gradients method is a well-known approach for solving optimization problems with 
uncertainty. Because of the nonconcave nature of the yield maximization problem, in our case it 
will amount to a local search method. Furthermore, there arc some difficulties that are caused 
by the nonsmoothness in our models. We will only outline here how we are approaching these 
difficulties. 

The core step in the stochastic gradients approach is to (randomly) sample a cascade, and, 
keeping the cascade fixed, to compute the impact on yield of infinitesimally small changes in the 
control parameters. This is followed by a line search to optimize the step size. This basic step is 
repeated. 

A difficulty that we encounter when we attempt to make this outline more specific is that yield 
is not a differentiable function of the control parameters, for several reasons, the main one being 
the stochastic outage protocol (3.2), which, while smoother than a completely deterministic rule, 
is not smooth enough, due to its abrupt transition between stochastic and deterministic regimes. 

We modify rule (3.2) so that the probability of a line will outage is always strictly positive and 
strictly smaller than 1; however when the overload is larger than 1 the outaging probability will be 
very close to 1, and when the overload is significantly less than 1 the outaging probability (while 
positive) will be very small. To this effect consider a function 

F : M+ ^ [0, 1), with F(0) = and F{x) 1 as x ^ +oo, 

where the convergence is very rapid. An example is F[x) = 1 — e~^^, for large M > 0. Likewise, 
consider a function 

G : [0, 1] ^ [0, 1), with G(0) = 1 and G{x) ^0 as x ^ I, 

and again with rapid convergence. An example is G{x) = e~^^ for large M > 0. 

Having chosen F and G, wc modify outage rule (3.2) as follows. At round r, and given a 
tolerance > < 1, line j is outaged 

' |G (l - /;/(l - e,)) , if/;<(l-e,)t.,- 

if (1 - er)uj < /J < Uj (25) 
. + - l)), iiuj<f;. 



with probability 



1 

2' 



In other words, if /J > Uj, the outage probability is very large, but is bounded strictly away from 
1, and if /J < (1 — er)uj the outage probability is very small but remains strictly positive. By 
choosing F and G appropriately we obtain an outage model that is arbitrarily close to rule (3.2). 

Another source of nonsmoothness in our models is the general form our control law in Step 2 of 
Procedure (4.2). However, it is easy to see that the law can be approximated (arbitrarily closely) 
using a smooth control. 

The computation of the (stochastic) gradient of the yield function at a given control vector 
(c. b. s) can now be described. First, we samole a random cascade under the control (c. b. s) and 



outaging lines using rule (25). This produces a particular sequence of lines that become outaged; 
i.e. at round r a certain set S"" of lines is outaged, for r = 1,2, . . . , R — 1. 

Next, we compute the change in yield that results when we perturb the control by a vector 
(e*^, e**, e'') with infinitesimally small entries, while still assuming that set S"" is the set of lines 
outaged at round r, for each r. This is a deterministic computation; rule (25) guarantees that 
the given cascade structure retains positive probability. This computation gives us the stochastic 
gradient. 

However, at this point we need to deal with the final reason that the yield function is not 

smooth, and this is the demand/supply adjustment in Step 3 of our generic cascade template (3.1) 
(or in Step 6 of the cascade control template (4.1)). If, at round r, under control (c, b, ,s) a certain 
component K has equal demand and supply, then no adjustment takes place. However, even a 
small change in the control that results in shedding less demand by round r will not result in an 
increase of yield, whereas the opposite change in control will possibly result in a decrease in yield. 
Thus, in essence, a left derivative is different from the right derivative; and moreover this is not a 
probability zero event. 

Nevertheless, it is still possible to adjust the stochastic gradients framework so as to recover a 
valid first-order method. The resulting approach is related to the classical Prank- Wolfe method [3]. 
In forthcoming work we will report on experiments with this approach. 

9 Upcoming work 

Our forthcoming work will focus on three areas: stochastics or robustness, in particular concerning 
the optimal scaling problem in Section 6.2, an investigation of game-theoretic aspects of the type 
of control we study, and the use of AC power flow models. 

With regards to the last point, a recent paper of Lavaei and Low [20] may yield a robust solver 
for AC power flow systems under severe contingencies. Even though the work in [20] relies on 
semi-definite programming, in fact one of the algorithms can be restated as a second-order conic 
program, which may be more efficient. 
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A Appendix - Formulations for the Optimal Demand Shedding 
Problem 



We will first describe mixed-integer programming formulation for computing an optimal schedule 
for demand shedding, under deterministic line outages. In the formulation below our variables have 
the following interpretations: 

• fj is the flow on line (j) during round r and ttJ (i/J) is the positive (resp., negative) part of 

J 3 

• ^[ is the phase of bus i during round r 

• for a demand bus i, d\ indicates its demand during round r 

• for a generator bus i, s[ indicates its supply during round r 

• for a line j, the variable takes value 1 if arc j becomes outaged during round r (and it 
takes value otherwise). 

• for a line j, the 0/1 variable takes value 1 if /J > 0; likewise takes value 1 if /J < 

For a demand bus i, we indicate by the constant dj its demand at the start of the cascade. Let D 
denote the sum of all such quantities dj. The formulation is as follows: 



max 

( ieg 

Subject to: ^ fj - ^ fj = i -dl i ^ yi<r<R (26) 

j&s+{i) jeS-{i) [ otherwise 

/J = TT^. - ly^. y j eAandl<r <R (27) 
vr]; < Dp'j, u] < Dn], V j € ^ and 1 < r < (28) 

r-l 

+ = 1 - ^ yj^, V i G ^ and 1 < r < i? (29) 

h=i 

ttJ + i/J - Uj < Dy] V j G ^ and 1 < r < i? (30) 

vr} + i^J > tijy} V j G ^ and 1 < r < ii; - 1 (31) 

7rf + z^f <Uj^jeA (32) 

r-l 



- x,/J| < M, Vi G ^ (33) 

h=i 



Q<sl<Si^ieg, 0<(fi <di yieV, (34) 
Pj, <j, y!; = or 1, V j G ^ and 1 < r < (35) 

< ttJ, < i/J, V i G ^ and 1 < r < i?. (36) 

In this formulation, (26) is a flow balance constraint: it specifies that during round r each generator 
i outputs s[ units of flow, and similarly with demand buses. Constraints (27)-(29) together with 
the fact that pj and are 0/1 variables, guarantee that ttJ (uj) is the positive (resp., negative) 
part of /J, and that furthermore /J = if line j was outagcd at a round prior to r. Constraint (30) 
guarantees that if y!^ = then < Uj and (31) guarantees that if y!;' = 1 then > Uj. Thus, 
we obtain a mix of the two alternative versions of rule (F.l). 

Constraint (32) indicates the desired termination condition in round R. We note that (33) 
involves an absolute value but is easily replaced by two standard linear inequalities. The quantity 
Mj is assumed to be a "large enough" quantity (see [5] for a related discussion). 

Other formulations are possible, and in particular it is easy to enforce additional rules constrain- 
ing how demand can be shed. The formulation can also be adapted to use the memory-dependent 
outage models (3) or (4). By adapting constraints (30) and (31) one can model the stochastic rule 
(F.2), obtaining a (mixed-integer) stochastic program; the underlying uncertainty is primarily of 
an endogenous nature; see [14]. 

A. 0.1 Discussion 

The above problem, even in its simplest, deterministic form is likely quite nontrivial. Constraints 

(30) and (31) are essential in modeling the outaging of lines; similar constraints arc used in for- 
mulations for the classical N — K problem (see [5], [6]) and contribute to make the problem very 
difficult. Note that we would need to handle cases with n > 10"^, m > 2 x 10"^ fand R> 2). 



A larger concern involves the fact that the optimal solution is likely to entail very complex 
control strategies that may be difficult to implement. All the approaches we discussed above, 
including the stochastic programming versions of the formulation, are likely to specify very precise 
actions in each round (and scenario) which may be problematic in practice in what likely would be 
a very "noisy" environment. 

Nevertheless, the study of the formulation may still prove a very worthwhile exercise, one that 
could highlight underlying weaknesses of the models and hidden vulnerabilities in the operation of 
the grid. 



